Results 1 - 20 of 41
1.
Mol Inform ; 43(1): e202300190, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37885368

ABSTRACT

GUIDEMOL is a Python computer program based on the RDKit software to process molecular structures and calculate molecular descriptors with a graphical user interface using the tkinter package. It can calculate descriptors already implemented in RDKit as well as grid representations of 3D molecular structures using the electrostatic potential or voxels. The GUIDEMOL app provides easy access to RDKit tools for chemoinformatics users with no programming skills and can be adapted to calculate other descriptors or to trigger other procedures. A command line interface (CLI) is also provided for the calculation of grid representations. The source code is available at https://github.com/jairesdesousa/guidemol.


Subjects
Cheminformatics ; Software ; Adaptor Proteins, Signal Transducing
2.
Mol Inform ; 42(1): e2200193, 2023 01.
Article in English | MEDLINE | ID: mdl-36167940

ABSTRACT

Random Forest (RF) QSPR models were developed with a data set of homolytic bond dissociation energies (BDE) previously calculated by B3LYP/6-311++G(d,p)//DFTB for 2263 sp3 C-H covalent bonds. The best set of attributes consisted of 114 descriptors of the carbon atom (counts of atom types in 5 spheres around the kernel atom and ring descriptors). The optimized model predicted the DFT-calculated BDE of an independent test set of 224 bonds with MAE=2.86 kcal/mol. A new data set of 409 bonds from the iBonD database (http://ibond.nankai.edu.cn) was predicted by the RF with a modest MAE (5.36 kcal/mol) but a relatively high R² (0.75) against experimental energies. A prediction scheme was explored that corrects the RF prediction with the average deviation observed for the k nearest neighbours (KNN) in an additional memory of experimental data. The corrected predictions achieved MAE=2.22 kcal/mol for an independent test set of 145 bonds and the corresponding experimental bond energies.
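
The KNN correction described in this abstract can be illustrated with a minimal sketch. It assumes Euclidean distance in descriptor space and an unweighted mean of the deviations; the abstract does not state either choice, and the function name, the memory layout, and all numbers below are hypothetical.

```python
import math

def knn_corrected_prediction(x, raw_prediction, memory, k=3):
    """Shift a model prediction by the mean deviation of the k nearest memory entries.

    memory: list of (descriptor_vector, model_prediction, experimental_value).
    The distance metric and unweighted averaging are assumptions of this sketch.
    """
    # Rank memory entries by Euclidean distance to the query descriptors.
    nearest = sorted(memory, key=lambda entry: math.dist(x, entry[0]))[:k]
    # Deviation = experimental value minus the model prediction for that entry.
    mean_deviation = sum(exp - pred for _, pred, exp in nearest) / len(nearest)
    return raw_prediction + mean_deviation

# Toy, made-up values in kcal/mol (not taken from the paper).
memory = [
    ((0.0, 0.0), 95.0, 98.0),  # model was 3.0 too low here
    ((0.1, 0.0), 96.0, 97.0),  # model was 1.0 too low here
    ((5.0, 5.0), 80.0, 70.0),  # far from the query, ignored when k=2
]
corrected = knn_corrected_prediction((0.05, 0.0), 94.0, memory, k=2)
print(corrected)
```

With k=2 the two close neighbours give a mean deviation of +2.0, so the raw prediction of 94.0 is shifted to 96.0.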


Subjects
Machine Learning ; Thermodynamics ; Calibration
3.
Chemphyschem ; 23(24): e202200300, 2022 12 16.
Article in English | MEDLINE | ID: mdl-35929613

ABSTRACT

Machine-learning models were developed to predict the composition profile of a three-compound mixture in liquid-liquid equilibrium (LLE), given the global composition at a certain temperature and pressure. A chemoinformatics approach was explored, based on the MOLMAP technology to encode molecules and mixtures. The chemical systems involved an ionic liquid (IL) and two organic molecules. Two complementary models have been optimized for the IL-rich and IL-poor phases. The two global optimized models are highly accurate, and were validated with independent test sets, where combinations of molecule1+molecule2+IL are different from those in the training set. These results highlight the MOLMAP encoding scheme, based on atomic properties, to train models that learn relationships between features of complex multi-component chemical systems and their profile of phase compositions.


Subjects
Cheminformatics ; Ionic Liquids ; Ionic Liquids/chemistry ; Temperature
4.
Sci Rep ; 11(1): 23720, 2021 12 09.
Article in English | MEDLINE | ID: mdl-34887473

ABSTRACT

Machine learning (ML) algorithms were explored for the classification of the UV-Vis absorption spectrum of organic molecules based on molecular descriptors and fingerprints generated from 2D chemical structures. Training and test data (~75k molecules and associated UV-Vis data) were assembled from a database with lists of experimental absorption maxima. They were labeled with positive class (related to photoreactive potential) if an absorption maximum is reported in the range between 290 and 700 nm (UV/Vis) with molar extinction coefficient (MEC) above 1000 L mol⁻¹ cm⁻¹, and as negative if no such peak is in the list. Random forests were selected among several algorithms. The models were validated with two external test sets comprising 998 organic molecules, obtaining a global accuracy up to 0.89, sensitivity of 0.90 and specificity of 0.88. The ML output (UV-Vis spectrum class) was explored as a predictor of the 3T3 NRU phototoxicity in vitro assay for a set of 43 molecules. Comparable results were observed with the classification directly based on experimental UV-Vis data in the same format.
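
The labeling rule and the reported metrics can be written out as a short sketch. The function names are hypothetical, the peak lists and predictions are made up, and treating the 290 and 700 nm bounds as inclusive is an assumption, since the abstract does not say how boundary values were handled.

```python
def uv_vis_class(peaks, lo_nm=290.0, hi_nm=700.0, min_mec=1000.0):
    """Return 1 if any (wavelength_nm, mec) peak falls in the window with MEC above the cutoff."""
    # Inclusive wavelength bounds are an assumption of this sketch.
    return int(any(lo_nm <= wl <= hi_nm and mec > min_mec for wl, mec in peaks))

def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # fraction of true positives recovered
        "specificity": tn / (tn + fp),  # fraction of true negatives recovered
    }

# Made-up peak lists: (wavelength in nm, MEC in L mol^-1 cm^-1).
labels = [
    uv_vis_class([(254.0, 12000.0), (310.0, 4500.0)]),  # strong peak inside the window
    uv_vis_class([(254.0, 12000.0)]),                   # only a peak below 290 nm
    uv_vis_class([(450.0, 300.0)]),                     # inside the window but too weak
]
metrics = confusion_metrics([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
print(labels, metrics)
```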

5.
Eur J Med Chem ; 210: 112985, 2021 Jan 15.
Article in English | MEDLINE | ID: mdl-33189435

ABSTRACT

Aiming at generating a series of monoterpene indole alkaloids with enhanced multidrug resistance (MDR) reversing activity in cancer, two major epimeric alkaloids isolated from Tabernaemontana elegans, tabernaemontanine (1) and dregamine (2), were derivatized by alkylation of the indole nitrogen. Twenty-six new derivatives (3-28) were prepared by reaction with different aliphatic and aromatic halides, whose structures were elucidated mainly by NMR, including 2D NMR experiments. Their MDR reversal ability was evaluated through a functional assay, using as models resistant human colon adenocarcinoma and human ABCB1-gene transfected L5178Y mouse lymphoma cells, overexpressing P-glycoprotein (P-gp), by flow cytometry. A considerable increase of activity was found for most of the derivatives, with the strongest P-gp inhibitors being those sharing N-phenethyl moieties, which displayed outstanding inhibitory activity associated with weak cytotoxicity. Chemosensitivity assays were also performed in a model of combination chemotherapy in the same cell lines, by studying the in vitro interactions between the compounds and the antineoplastic drug doxorubicin. Most of the compounds have shown strong synergistic interactions with doxorubicin, highlighting their potential as MDR reversers. QSAR models were also explored for insights on drug-receptor interaction, and it was found that lipophilicity and bulkiness features were associated with inhibitory activity, although linear correlations were not observed.


Subjects
ATP Binding Cassette Transporter, Subfamily B, Member 1/antagonists & inhibitors ; Antineoplastic Agents/pharmacology ; Indole Alkaloids/pharmacology ; Alkylation ; Animals ; Antineoplastic Agents/chemical synthesis ; Antineoplastic Agents/chemistry ; Cell Proliferation/drug effects ; Dose-Response Relationship, Drug ; Drug Screening Assays, Antitumor ; Indole Alkaloids/chemical synthesis ; Indole Alkaloids/chemistry ; Mice ; Molecular Structure ; Quantitative Structure-Activity Relationship ; Tumor Cells, Cultured
6.
J Chem Inf Model ; 61(1): 67-75, 2021 01 25.
Article in English | MEDLINE | ID: mdl-33350814

ABSTRACT

In this study, machine learning algorithms were investigated for the classification of organic molecules with one carbon chiral center according to the sign of optical rotation. Diverse heterogeneous data sets comprising up to 13,080 compounds and their corresponding optical rotation were retrieved from Reaxys and processed independently for three solvents: dichloromethane, chloroform, and methanol. The molecular structures were represented by chiral descriptors based on the physicochemical and topological properties of ligands attached to the chiral center. The sign of optical rotation was predicted by random forests (RF) and artificial neural networks for independent test sets with an accuracy of up to 75% for dichloromethane, 82% for chloroform, and 82% for methanol. RF probabilities and the availability of structures in the training set with the same spheres of atom types around the chiral center defined applicability domains in which the accuracy is higher.


Subjects
Machine Learning ; Neural Networks, Computer ; Algorithms ; Molecular Structure ; Optical Rotation ; Stereoisomerism
7.
Mol Inform ; 39(9): e2000001, 2020 09.
Article in English | MEDLINE | ID: mdl-32469147

ABSTRACT

The increasing application of new ionic liquids (IL) creates the need for liquid-liquid equilibria data for both miscible and quasi-immiscible systems. In this study, equilibrium concentrations at different temperatures for ionic liquid+water two-phase systems were modeled using a Quantitative Structure-Property Relationship (QSPR) method. Data on equilibrium concentrations were taken from the ILThermo Ionic Liquids database, curated and used to make models that predict the weight fraction of water in the ionic liquid-rich phase and of ionic liquid in the aqueous phase as two separate properties. The major modeling challenge stems from the fact that each single IL is characterized by several data points, since equilibrium concentrations are temperature dependent. Thus, new approaches for the detection of potential data point outliers, testing set selection, and quality prediction have been developed. The training set comprised equilibrium concentration data for 67 and 68 ILs for water-in-IL and IL-in-water modeling, respectively. SiRMS, MOLMAPS, Rcdk and Chemaxon descriptors were used to build Random Forest models for both properties. Models were subjected to the Y-scrambling test for robustness assessment. The best models have also been validated using an external test set that is not part of the ILThermo database. A two-phase equilibrium diagram for one of the external test set ILs is presented for better visualization of the results and potential derivation of tie lines.


Subjects
Ionic Liquids/chemistry ; Models, Chemical ; Quantitative Structure-Activity Relationship ; Water/chemistry ; Data Curation ; Datasets as Topic ; Osmolar Concentration ; Pressure ; Temperature
8.
Spectrochim Acta A Mol Biomol Spectrosc ; 223: 117289, 2019 Dec 05.
Article in English | MEDLINE | ID: mdl-31255865

ABSTRACT

A chemoinformatics method was applied to the assignment of absolute configurations and to the quantitative prediction of specific optical rotations using a data set of 88 chiral fluorinated molecules (44 pairs of enantiomers). Counterpropagation neural networks were explored for the classification of enantiomers as dextrorotatory or levorotatory. Regression models were trained using multilayer perceptrons (MLP), random forests (RF) or multilinear regressions (MLR), on the basis of physicochemical atomic stereo (PAS) descriptors. New descriptors were also derived considering the common structural features of the data set (cPAS descriptors), which enabled RF models to predict the whole data set with R = 0.964, mean absolute error (MAE) of 9.8° and root mean square error (RMSE) of 12.5° in leave-one-pair-out cross-validation experiments. The predictions for the 30 compounds measured in chloroform were obtained with R = 0.971, MAE = 9.1° and RMSE = 12.5°, which compares favorably with quantum chemistry calculations reported in the literature.

9.
Molecules ; 23(11)2018 Nov 18.
Article in English | MEDLINE | ID: mdl-30453681

ABSTRACT

A series of π-conjugated molecules, based on pyridazine and thiophene heterocycles 3a-e, were synthesized using commercially or readily available coupling components, through a palladium-catalyzed Suzuki-Miyaura cross-coupling reaction. The electron-deficient pyridazine heterocycle was functionalized by a thiophene electron-rich heterocycle at position six, and different (hetero)aromatic moieties (phenyl, thienyl, furanyl) were functionalized with electron acceptor groups at position three. Density Functional Theory (DFT) calculations were carried out to obtain information on the conformation, electronic structure, electron distribution, dipolar moment, and molecular nonlinear response of the synthesized push-pull pyridazine derivatives. Hyper-Rayleigh scattering in 1,4-dioxane solutions, using a fundamental wavelength of 1064 nm, was used to evaluate their second-order nonlinear optical properties. The thienylpyridazine functionalized with the cyano-phenyl moiety exhibited the largest first hyperpolarizability (β = 175 × 10⁻³⁰ esu, using the T convention), indicating its potential as a second harmonic generation (SHG) chromophore.


Subjects
Models, Theoretical ; Oxidative Coupling ; Pyridazines/chemical synthesis ; Pyridazines/chemistry ; Pyridazines/pharmacology ; Spectrum Analysis
10.
J Cheminform ; 10(1): 43, 2018 Aug 22.
Article in English | MEDLINE | ID: mdl-30136001

ABSTRACT

Machine learning (ML) algorithms were explored for the fast estimation of molecular dipole moments calculated by density functional theory (DFT) by B3LYP/6-31G(d,p) on the basis of molecular descriptors generated from DFT-optimized geometries and partial atomic charges obtained by empirical or ML schemes. A database was used with 10,071 structures, new molecular descriptors were designed and the models were validated with external test sets. Several ML algorithms were screened. Random forest regression models predicted an external test set of 3368 compounds achieving mean absolute error up to 0.44 D. The results represent a significant improvement of the dipole moments calculated using empirical point charges located at the nucleus, even assuming the DFT-optimized geometry (root mean square error, RMSE, of 0.68 D vs. 1.53 D and R² = 0.87 vs. 0.66).

11.
Mar Drugs ; 16(7)2018 Jul 13.
Article in English | MEDLINE | ID: mdl-30011882

ABSTRACT

Computational methodologies are assisting the exploration of marine natural products (MNPs) to make the discovery of new leads more efficient, to repurpose known MNPs, to target new metabolites on the basis of genome analysis, to reveal mechanisms of action, and to optimize leads. In silico efforts in drug discovery of NPs have mainly focused on two tasks: dereplication and prediction of bioactivities. The exploration of new chemical spaces and the application of predicted spectral data must be included in new approaches to select species, extracts, and growth conditions with maximum probabilities of medicinal chemistry novelty. In this review, the most relevant current computational dereplication methodologies are highlighted. Structure-based (SB) and ligand-based (LB) chemoinformatics approaches have become essential tools for the virtual screening of NPs either in small datasets of isolated compounds or in large-scale databases. The most common LB techniques include Quantitative Structure-Activity Relationships (QSAR), estimation of drug likeness, prediction of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties, similarity searching, and pharmacophore identification. Analogously, molecular dynamics, docking and binding cavity analysis have been used in SB approaches. Their significance and achievements are the main focus of this review.


Subjects
Aquatic Organisms ; Biological Products/chemistry ; Computational Biology/methods ; Drug Discovery/methods ; Models, Biological ; Biological Products/pharmacology ; Chemistry, Pharmaceutical/methods ; Drug Design ; Models, Chemical ; Models, Molecular ; Molecular Structure ; Quantitative Structure-Activity Relationship
12.
Bioinformatics ; 34(1): 120-121, 2018 01 01.
Article in English | MEDLINE | ID: mdl-28968640

ABSTRACT

Summary: The representation of metabolic reactions strongly relies on visualization, which is a major barrier for blind users. The NavMol software renders the communication and interpretation of molecular structures and reactions accessible by integrating chemoinformatics and assistive technology. NavMol 3.0 provides a molecular editor for metabolic reactions. The user can start with templates of reactions and build from such cores. Atom-to-atom mapping enables changes in the reactants to be reflected in the products (and vice-versa) and the reaction centres to be automatically identified. Blind users can easily interact with the software using the keyboard and text-to-speech technology. Availability and implementation: NavMol 3.0 is free and open source under the GNU general public license (GPLv3), and can be downloaded at http://sourceforge.net/projects/navmol as a JAR file. Contact: joao@airesdesousa.com.


Subjects
Blindness ; Metabolic Networks and Pathways ; Sensory Aids ; Software ; Humans
13.
Med Chem ; 13(5): 439-447, 2017.
Article in English | MEDLINE | ID: mdl-28185538

ABSTRACT

BACKGROUND: Tuberculosis (TB) is the second leading cause of mortality worldwide, being a highly contagious and insidious illness caused by Mycobacterium tuberculosis, Mtb. Additionally, the emergence of multidrug-resistant and extensively drug-resistant strains of Mtb, together with significant levels of co-infection with HIV and TB (HIV/TB), makes the search for new antitubercular drugs urgent and challenging. METHODS: This work was based on the hypothesis that an active compound could be obtained if substituents present in some other active compounds were attached to a core of an important structure, in this case the indole scaffold, thus generating a hybrid compound. A QSAR-oriented design based on classification and regression models, along with the estimation of physicochemical and biological properties, has also been used to assist in the selection of compounds. Chosen compounds were synthesized using various synthetic procedures and evaluated against M. tuberculosis H37Rv strain. RESULTS: Selected compounds possess substituents at positions C5, C2 and N1 of the indole ring. The substituents involve p-halophenyl, pyridyl, benzyloxy and benzylamine groups. Four compounds were synthesised using suitable synthetic procedures to attain the desired substitution at the indole core. From these, three compounds are new and have been fully characterized, and tested in vitro against the H37Rv ATCC27294T Mtb strain, using isoniazid as a control. One of them, compound 2, with the pyridyl group at N1, has an experimental log (1/MIC) very close to 5 and can be considered as being (weakly) active. In fact, it is more active than 64% of all indole molecules in our data sets of experimental results from literature. The most active indole in these data sets has log (1/MIC)=5.93, with only 6 compounds with log (1/MIC) above 5.5.
CONCLUSION: Despite the lower activity found for the tested compounds, when compared to other reported indole-derivatives, these structures, which rely on a hybrid design concept, may constitute interesting scaffolds to prepare a new family of TB inhibitors with improved activity.


Subjects
Antitubercular Agents/pharmacology ; Indoles/pharmacology ; Pyridines/pharmacology ; Antitubercular Agents/chemical synthesis ; Drug Design ; Indoles/chemical synthesis ; Isoniazid/pharmacology ; Machine Learning ; Mycobacterium tuberculosis/drug effects ; Neural Networks, Computer ; Pyridines/chemical synthesis ; Quantitative Structure-Activity Relationship
14.
J Chem Inf Model ; 57(1): 11-21, 2017 01 23.
Article in English | MEDLINE | ID: mdl-28033004

ABSTRACT

Machine learning algorithms were explored for the fast estimation of HOMO and LUMO orbital energies calculated by DFT B3LYP, on the basis of molecular descriptors exclusively based on connectivity. The whole project involved the retrieval and generation of molecular structures, quantum chemical calculations for a database with >111 000 structures, development of new molecular descriptors, and training/validation of machine learning models. Several machine learning algorithms were screened, and an applicability domain was defined based on Euclidean distances to the training set. Random forest models predicted an external test set of 9989 compounds achieving mean absolute error (MAE) up to 0.15 and 0.16 eV for the HOMO and LUMO orbitals, respectively. The impact of the quantum chemical calculation protocol was assessed with a subset of compounds. Inclusion of the orbital energy calculated by PM7 as an additional descriptor significantly improved the quality of estimations (reducing the MAE by >30%).
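
An applicability domain based on Euclidean distances to the training set can be sketched as follows. The abstract does not specify how distances were aggregated or which cutoff was used, so the mean distance to the nearest training points and the threshold value here are assumptions, and the function names are hypothetical.

```python
import math

def distance_to_training_set(x, training, n_neighbors=1):
    """Mean Euclidean distance from x to its n_neighbors closest training vectors."""
    distances = sorted(math.dist(x, t) for t in training)
    nearest = distances[:n_neighbors]
    return sum(nearest) / len(nearest)

def in_applicability_domain(x, training, threshold, n_neighbors=1):
    """A query is inside the domain when it lies close enough to the training data."""
    return distance_to_training_set(x, training, n_neighbors) <= threshold

# Made-up two-dimensional descriptor vectors.
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
inside = in_applicability_domain((0.5, 0.5), training, threshold=1.0)
outside = in_applicability_domain((4.0, 4.0), training, threshold=1.0)
print(inside, outside)
```

Predictions for queries flagged as outside the domain would then be reported with lower confidence or withheld, depending on the use case.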


Subjects
Machine Learning ; Quantum Theory
15.
Mol Inform ; 35(2): 62-9, 2016 02.
Article in English | MEDLINE | ID: mdl-27491791

ABSTRACT

To enable the fast estimation of atom condensed Fukui functions, machine learning algorithms were trained with databases of DFT pre-calculated values for ca. 23,000 atoms in organic molecules. The problem was approached as the ranking of atom types with the Bradley-Terry (BT) model, and as the regression of the Fukui function. Random Forests (RF) were trained to predict the condensed Fukui function, to rank atoms in a molecule, and to classify atoms as high/low Fukui function. Atomic descriptors were based on counts of atom types in spheres around the kernel atom. The BT coefficients assigned to atom types enabled the identification (93-94% accuracy) of the atom with the highest Fukui function in pairs of atoms in the same molecule with differences ≥0.1. In whole molecules, the atom with the top Fukui function could be recognized in ca. 50% of the cases and, on average, about 3 of the top 4 atoms could be recognized in a shortlist of 4. Regression RF yielded predictions for test sets with R² = 0.68-0.69, improving the ability of BT coefficients to rank atoms in a molecule. Atom classification (as high/low Fukui function) was obtained with RF with sensitivity of 55-61% and specificity of 94-95%.
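
For readers unfamiliar with the Bradley-Terry model, a minimal sketch of how per-type coefficients can be turned into pairwise probabilities and a ranking is given here. The standard BT form P(i outranks j) = b_i / (b_i + b_j) is used; the atom-type names and coefficient values are invented for illustration and are not the paper's fitted values.

```python
def bt_win_probability(coef_i, coef_j):
    """Bradley-Terry probability that item i outranks item j (positive strengths)."""
    return coef_i / (coef_i + coef_j)

def rank_atoms(atom_types, coefficients):
    """Return atom indices ordered from highest to lowest BT coefficient of their type."""
    return sorted(
        range(len(atom_types)),
        key=lambda idx: coefficients[atom_types[idx]],
        reverse=True,
    )

# Hypothetical atom types and coefficients (illustrative only).
coefficients = {"C.ar": 2.0, "N.amine": 4.0, "O.carbonyl": 6.0}
atoms = ["C.ar", "N.amine", "O.carbonyl", "C.ar"]

ranking = rank_atoms(atoms, coefficients)
p_o_over_c = bt_win_probability(coefficients["O.carbonyl"], coefficients["C.ar"])
print(ranking, p_o_over_c)
```

A shortlist of the top-ranked atoms, such as the shortlist of 4 mentioned in the abstract, would simply be the first entries of such a ranking.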


Subjects
Machine Learning ; Models, Chemical ; Quantitative Structure-Activity Relationship
16.
Eur J Med Chem ; 81: 119-38, 2014 Jun 23.
Article in English | MEDLINE | ID: mdl-24836065

ABSTRACT

The disturbing emergence of multidrug-resistant strains of Mycobacterium tuberculosis (Mtb) has been driving the scientific community to urgently search for new and efficient antitubercular drugs. Despite the various drugs currently under evaluation, isoniazid is still the key and most effective component in all multi-therapeutic regimens recommended by the WHO. This paper describes the QSAR-oriented design, synthesis and in vitro antitubercular activity of several potent isoniazid derivatives (isonicotinoyl hydrazones and isonicotinoyl hydrazides) against H37Rv and two resistant Mtb strains. QSAR studies entailed RFs and ASNNs classification models, as well as MLR models. Strict validation procedures were used to guarantee the models' robustness and predictive ability. Lipophilicity was shown not to be relevant to explain the activity of these derivatives, whereas shorter N-N distances and lengthy substituents lead to more active compounds. Compounds 1, 2, 4, 5 and 6, showed measured activities against H37Rv higher than INH (i.e., MIC ≤ 0.28 µM), while compound 9 exhibited a six fold decrease in MIC against the katG (S315T) mutated strain, by comparison with INH (i.e., 6.9 vs. 43.8 µM). All compounds were ineffective against H37RvINH (ΔkatG), a strain with a full deletion of the katG gene, thus corroborating the importance of KatG in the activation of INH-based compounds. The most potent compounds were also shown not to be cytotoxic up to a concentration 500 times higher than MIC.


Subjects
Antitubercular Agents/pharmacology ; Drug Design ; Isoniazid/analogs & derivatives ; Isoniazid/pharmacology ; Mycobacterium tuberculosis/drug effects ; Animals ; Antitubercular Agents/chemical synthesis ; Antitubercular Agents/chemistry ; Chlorocebus aethiops ; Crystallography, X-Ray ; Dose-Response Relationship, Drug ; Isoniazid/chemistry ; Microbial Sensitivity Tests ; Models, Molecular ; Molecular Structure ; Structure-Activity Relationship ; Vero Cells
17.
PLoS One ; 9(2): e88499, 2014.
Article in English | MEDLINE | ID: mdl-24551112

ABSTRACT

The combination of chemoinformatics approaches with NMR techniques and the increasing availability of data allow the resolution of problems far beyond the original application of NMR in structure elucidation/verification. The diversity of applications can range from process monitoring, metabolic profiling, authentication of products, to quality control. An application related to the automatic analysis of complex mixtures concerns mixtures of chemical reactions. We encoded mixtures of chemical reactions with the difference between the ¹H NMR spectra of the products and the reactants. All the signals arising from all the reactants of the co-occurring reactions were taken together (a simulated spectrum of the mixture of reactants) and the same was done for products. The difference spectrum is taken as the representation of the mixture of chemical reactions. A data set of 181 chemical reactions was used, each reaction manually assigned to one of 6 types. From this data set, we simulated mixtures where two reactions of different types would occur simultaneously. Automatic learning methods were trained to classify the reactions occurring in a mixture from the ¹H NMR-based descriptor of the mixture. Unsupervised learning methods (self-organizing maps) produced a reasonable clustering of the mixtures by reaction type, and allowed the correct classification of 80% and 63% of the mixtures in two independent test sets of different similarity to the training set. With random forests (RF), the percentage of correct classifications was increased to 99% and 80% for the same test sets. The RF probability associated with the predictions yielded a robust indication of their reliability. This study demonstrates the possibility of applying machine learning methods to automatically identify types of co-occurring chemical reactions from NMR data. Using no explicit structural information about the reaction participants, reaction elucidation is performed without structure elucidation of the molecules in the mixtures.
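
The difference-spectrum encoding can be illustrated with a small sketch. It assumes each molecule is given as a list of (chemical shift in ppm, intensity) peaks and that spectra are compared after simple binning; the bin range, bin count, function names, and peak values are all hypothetical and not taken from the paper.

```python
def binned_spectrum(peaks, lo=0.0, hi=12.0, n_bins=12):
    """Accumulate (shift_ppm, intensity) peaks into fixed-width bins."""
    width = (hi - lo) / n_bins
    bins = [0.0] * n_bins
    for shift, intensity in peaks:
        if lo <= shift < hi:
            # min() guards against floating-point round-up at the upper edge.
            bins[min(int((shift - lo) / width), n_bins - 1)] += intensity
    return bins

def reaction_descriptor(reactants, products, **binning):
    """Binned spectrum of all products minus binned spectrum of all reactants."""
    reactant_bins = binned_spectrum([p for mol in reactants for p in mol], **binning)
    product_bins = binned_spectrum([p for mol in products for p in mol], **binning)
    return [prod - reac for prod, reac in zip(product_bins, reactant_bins)]

# Made-up peak lists: one reactant molecule and one product molecule.
reactants = [[(9.7, 1.0), (2.2, 3.0)]]
products = [[(3.6, 2.0), (1.2, 3.0)]]
descriptor = reaction_descriptor(reactants, products)
print(descriptor)
```

A mixture of two co-occurring reactions would be encoded by passing the pooled reactant molecules and pooled product molecules of both reactions to the same function.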


Subjects
Algorithms ; Artificial Intelligence ; Magnetic Resonance Spectroscopy/statistics & numerical data ; Azirines/chemistry ; Cycloaddition Reaction ; Cycloparaffins/chemistry ; Photochemical Processes ; Pyridazines/chemistry ; Reproducibility of Results
18.
J Cheminform ; 5: 34, 2013.
Article in English | MEDLINE | ID: mdl-23849655

ABSTRACT

BACKGROUND: Rapid access to intrinsic physicochemical properties of molecules is highly desired for large-scale chemical data mining explorations such as mass spectrum prediction in metabolomics, toxicity risk assessment and drug discovery. Large volumes of data are being produced by quantum chemistry calculations, which provide increasingly accurate estimations of several properties, e.g. by Density Functional Theory (DFT), but are still too computationally expensive for those large-scale uses. This work explores the possibility of using large amounts of data generated by DFT methods for thousands of molecular structures, extracting relevant molecular properties and applying machine learning (ML) algorithms to learn from the data. Once trained, these ML models can be applied to new structures to produce ultra-fast predictions. An approach is presented for homolytic bond dissociation energy (BDE). RESULTS: Machine learning models were trained with a data set of >12,000 BDEs calculated by B3LYP/6-311++G(d,p)//DFTB. Descriptors were designed to encode atom types and connectivity in the 2D topological environment of the bonds. The best model, an Associative Neural Network (ASNN) based on 85 bond descriptors, was able to predict the BDE of 887 bonds in an independent test set (covering a range of 17.67-202.30 kcal/mol) with RMSD of 5.29 kcal/mol, mean absolute deviation of 3.35 kcal/mol, and R² = 0.953. The predictions were compared with semi-empirical PM6 calculations, and were found to be superior for all types of bonds in the data set, except for O-H, N-H, and N-N bonds. The B3LYP/6-311++G(d,p)//DFTB calculations can approach the higher-level calculations B3LYP/6-311++G(3df,2p)//B3LYP/6-31G(d,p) with an RMSD of 3.04 kcal/mol, which is less than the RMSD of ASNN (against both DFT methods). An experimental web service for on-line prediction of BDEs is available at http://joao.airesdesousa.com/bde.
CONCLUSION: Knowledge could be automatically extracted by machine learning techniques from a data set of calculated BDEs, providing ultra-fast access to accurate estimations of DFT-calculated BDEs. This demonstrates how to extract value from large volumes of data currently being produced by quantum chemistry calculations at an increasing speed mostly without human intervention. In this way, high-level theoretical quantum calculations can be used in large-scale applications that otherwise would not afford the intrinsic computational cost.

19.
J Chem Inf Model ; 52(12): 3116-22, 2012 Dec 21.
Article in English | MEDLINE | ID: mdl-23167287

ABSTRACT

Machine learning (SVM and JRip rule learner) methods have been used in conjunction with the Condensed Graph of Reaction (CGR) approach to identify errors in the atom-to-atom mapping of chemical reactions produced by an automated mapping tool by ChemAxon. The modeling has been performed on the first three enzymatic classes of metabolic reactions from the KEGG database. Each reaction has been converted into a CGR representing a pseudomolecule with conventional (single, double, aromatic, etc.) bonds and dynamic bonds characterizing chemical transformations. The ChemAxon tool was used to automatically detect the matching atom pairs in reagents and products. These automated mappings were analyzed by a human expert and classified as "correct" or "wrong". ISIDA fragment descriptors generated for CGRs for both correct and wrong mappings were used as attributes in machine learning. The learned models have been validated in n-fold cross-validation on the training set, followed by a challenge to detect correct and wrong mappings within an external test set of reactions never used for learning. Results show that both SVM and JRip models detect most of the wrongly mapped reactions. We believe that this approach could be used to identify erroneous atom-to-atom mapping performed by any automated algorithm.


Subjects
Computational Biology/methods ; Support Vector Machine ; Automation ; Databases, Protein ; False Positive Reactions ; Models, Biological
20.
Mol Inform ; 31(2): 135-44, 2012 Feb.
Article in English | MEDLINE | ID: mdl-27476958

ABSTRACT

Metabolic pathways are at the crossroads between the chemical world of small molecules and the biological world of enzymes, genes and regulation. Methods for their processing are therefore required for a great variety of applications. The work presented here reports a new method to encode metabolic pathways and reactomes of organisms based on the MOLMAP approach. Pathways are represented from features of the metabolites involved in their reactions, enabling the automatic perception of chemical similarities and making no use of EC numbers. MOLMAP descriptors are based on atomic topological and physicochemical features of the bonds involved in reactions. The results show that self-organizing maps (SOM) can be trained with MOLMAPs of pathways to automatically recognize similarities between pathways of the same type of metabolism. The study also illustrates the possibility of applying the MOLMAP methodology at progressively higher levels of complexity, bridging chemical and biological information, and going all the way from atomic properties to the classification of organisms.
